1 Overview

Our research question is: how to audit whether labels are “conceptualized consistently” across different countries and different languages. We propose a framework to audit labels themselves for world-wide inclusivity. Here, we are not concerned with a specific ML implementation, and instead audit the labels that are being fed into systems.

Figure 1 is descriptive statistics of our survey.

  1. We consider country comparisons in three ways:
  • Within each country, how much “internal” agreement is there on labels of interest? (Figure 2a)
  • How do different populations view these labels in specific contexts? (Figure2b)
  • If label outcome differences are found across countries, is there a data-backed explanation relating to culture? (Figure3)
  1. We consider language comparison in three ways:
  • How similar are label choices for bilingual speakers when asked questions in their native language versus English? (Figure4)
  • How similar are label choices for English-speakers across English-speaking countries? (Figure5)
  • How similar are label choices for speakers of non-English language? (Figure6)

Figure 5 and 6 should look similar to Figure 2b using similar approach and have not been plotted currently in this doc.

bookdown::serve_book()

2 Figure 1: Discriptive Statistics

source("read.R")

head(df)
## # A tibble: 6 × 17
##   Response.ID    game  country GENDER   AGE ageCut native Langu…¹ region exper…²
##   <chr>          <chr> <fct>   <chr>  <int> <fct>  <fct>  <chr>   <fct>    <dbl>
## 1 US_English_No… PUBG  US      Male      45 45-55  Engli… English US           5
## 2 US_English_No… PUBG  US      Male      45 45-55  Engli… English US           5
## 3 US_English_No… PUBG  US      Male      45 45-55  Engli… English US           5
## 4 US_English_No… PUBG  US      Male      45 45-55  Engli… English US           5
## 5 US_English_No… PUBG  US      Male      45 45-55  Engli… English US           5
## 6 US_English_No… PUBG  US      Male      45 45-55  Engli… English US           5
## # … with 7 more variables: hardcore <fct>, accessibility <fct>, labeler <chr>,
## #   ambassador <chr>, label <fct>, answer <int>, gender <chr>, and abbreviated
## #   variable names ¹​Language, ²​experience
## # ℹ Use `colnames()` to see all variable names

2.1 Participant breakdown by gender, ageCut, hardcore, accessibility, country, game

# 5,991 participants in total
nrow(df %>% distinct(Response.ID))
## [1] 5991
# gender, age, experience, accessibility (disability) breakdowns
# gender: Male, Female, and Nonbinary. 
df %>% distinct(gender, Response.ID) %>% count(gender)
## # A tibble: 3 × 2
##   gender        n
##   <chr>     <int>
## 1 Female      432
## 2 Male       5497
## 3 NonBinary    62
# ageCut: 18-24, 25-34, 35-44, 45-55, 55+. Per Xbox internal age breakdowns.
df %>% distinct(ageCut, Response.ID) %>% count(ageCut)
## # A tibble: 6 × 2
##   ageCut     n
##   <fct>  <int>
## 1 18-24   1182
## 2 25-34   1620
## 3 35-44   1641
## 4 45-55    773
## 5 55+      111
## 6 <NA>     664
# hardcore: hardcore (play games over 1 time a week), non-hardcore (play games 1 time or less)
df %>% distinct(hardcore, Response.ID) %>% count(hardcore)
## # A tibble: 2 × 2
##   hardcore         n
##   <fct>        <int>
## 1 hardcore      5608
## 2 non-hardcore   383
# accessibility: or really disability, Yes/No. Those people who incated that they have disability or not
df %>% distinct(accessibility, Response.ID) %>% count(accessibility)
## # A tibble: 4 × 2
##   accessibility            n
##   <fct>                <int>
## 1 No                    5466
## 2 Prefer not to answer   112
## 3 Yes                    328
## 4 <NA>                    85
# countries 
df %>% distinct(country, Response.ID) %>% count(country)
## # A tibble: 16 × 2
##    country          n
##    <fct>        <int>
##  1 US            3040
##  2 Germany        245
##  3 Poland         123
##  4 Greece         319
##  5 Japan          237
##  6 Korea          165
##  7 Singapore       69
##  8 India           99
##  9 Saudi.Arabia    35
## 10 South.Africa   212
## 11 Nigeria        170
## 12 Brazil         280
## 13 Argentina      182
## 14 Colombia       165
## 15 Chile          193
## 16 Mexico         457
# games
df %>% distinct(game, Response.ID) %>% count(game)
## # A tibble: 11 × 2
##    game                              n
##    <chr>                         <int>
##  1 Animal Crossing: New Horizons  1102
##  2 Call of Duty: Vanguard         1718
##  3 Candy Crush                     799
##  4 Elden Ring                     1896
##  5 FIFA22                         1123
##  6 Grand Theft Auto V             3168
##  7 Mario Kart 8                   1651
##  8 Minecraft                      2631
##  9 PUBG                           1241
## 10 Stardew Valley                 1043
## 11 The Sims 3                      429

2.2 Label correlations

  • In this plot, we look at correlation between labels from our sample. We can observe that difficulty, violent, action, action.motivation, control_complexity, strategy, learning_curve tend to go together. Comedic, creativity, pacifist, zen,made.for.kids, cozy are clustered together.
library(corrplot)
## corrplot 0.92 loaded
# disregard the NA
res <- df %>% 
  filter(!(label %in% c("NA.positive.opinion", "NA.negative.opinion", "NA.feeling", "NA.art"))) %>%
  mutate(label=factor(label, levels=unique(df$label))) %>%  # keep order in the final graph Figure 1
  select(Response.ID, label, answer) %>% 
  pivot_wider(., names_from = label, values_from = answer) %>%
  unnest() %>% 
  select(-Response.ID)

mx <- cor(res, use = "complete.obs")
corrplot(mx, type = "upper", order = "FPC", tl.cex=2,
         tl.col = "black", tl.srt = 45, title="Figure 1: Label Correlation")

3 Figure 2a: Consider country comparisons in three ways

This step is optional. We remove games where a country has very few response (< 5). In other words, we find games with >=5 participants across all countries.

# filter out countries with few data point
# for each game, remove countries with few data points
constraints <- df %>% count(game, country, label) %>%  # group by country and label 
  select(game, country, n) %>% distinct() %>%
  filter(n < 5)

# remove games that have too few response in any country
df <- df %>% filter(!(game %in% unique(constraints$game))) %>%
  filter(!(label %in% c("NA.positive.opinion", "NA.negative.opinion", "NA.feeling", "NA.art")))

# game included
unique(df$game)
## [1] "PUBG"                   "Grand Theft Auto V"     "Minecraft"             
## [4] "Elden Ring"             "FIFA22"                 "Call of Duty: Vanguard"

3.1 Within each country, how much “internal” agreement is there on labels of interest?

We seek to answer questions such as does the concept of a “cozy game” elicit more agreement within the US than it does in Japan? To achieve so, we calculate the average standard deviation/variance of label outcomes and comparing across countries. We plot standard deviation to reduce the length of this document.

Code Description: plotFigure2a is a wrapper function that plots all 28 label output. The first three are 1-3 scale, 1-4 scale for difficulty the latter 24 are 0-1 scale. plotVariance is the plotting function.

# wrapper function that enable individual game
plotFigure2a <- function(game = "All", sd=FALSE, debug=FALSE) {
  var.df <- df %>% group_by(game, country, label) %>%
    summarise(mean_answer = mean(answer, na.rm=TRUE),
          variance = var(answer, na.rm=TRUE),
          sd=sd(answer, na.rm=TRUE))
  
  if(game == "All" && !sd) {
    # average the variances across games for each country
    country.df <- var.df %>% group_by(country, label) %>% 
      summarise(y=mean(variance, na.rm=TRUE))
  } else if (game == "All" && sd) {
    country.df <- var.df %>% group_by(country, label) %>% 
      summarise(y=mean(sd, na.rm=TRUE))
  } else if(game != "All" && !sd) {
    country.df <- var.df %>% filter(game == {{ game }}) %>% mutate(y=variance)
  } else if (game != "All" && sd) {
    country.df <- var.df %>% filter(game == {{ game }}) %>% mutate(y=sd)
  } else {
    print("Error...")
  }
  
  p1 <- plotVariance(country.df, gameTitle=game, sd=sd, binary=FALSE)
  p2 <- plotVariance(country.df, gameTitle=game, sd=sd, binary=TRUE)
  res <- ggarrange(p1, p2, ncol=1, heights = c(1, 6))
  return(res)
}

# plot variance of a label across countries
plotVariance <- function(inputDf, gameTitle="All", sd=FALSE, binary=FALSE) {
  # color the US
  inputDf$color <- inputDf$country == "US"
  
  plotDf <- inputDf[with(inputDf, order(label, country, y)),] %>% 
      filter(!label %in% c("control_complexity", "learning_curve", "difficulty", "replayability")) 
      
  if(!binary) {
    plotDf <- inputDf[with(inputDf, order(label, country, y)),] %>% 
      filter(label %in% c("control_complexity", "learning_curve", "difficulty", "replayability"))
  } 
  if(sd) {
    yLabel <- "Standard Devaition"
  } else {
    yLabel <- "Variance"
  }
  p <- plotDf %>% ggplot(aes(x=reorder_within(country, y, label), y=y, color=color)) +
        geom_point() + 
        scale_x_reordered() +
        facet_wrap(~label, ncol=4, scales = "free") +
        ylim(c(0, 3)) +
        scale_color_manual(values=c("#999999", "#56B4E9")) +
        labs(y = yLabel, x = NULL,
             title = paste0("How much internal agreement is there on labels of interest (", gameTitle, ")")) 
  if(binary) {
    p <- p + ylim(c(0, 1))
  }
  return(p)
}

3.1.1 Taken all games together.

Here we look at the standard deviation across all the games. We look at all the games together. (e.g., Does the concept of a “cozy game” elicit more agreement within the US than it does in Japan regardless of games?) The US is highlighted in the graph in Figure 2a.

We can observe that Saudi Arabia with the highest variability (i.e., standard deviation) 11 times, South Africa 2 times, Nigeria 7 times, Singapore 4 times, Japan 1 time, India 1 time, Mexico 1 time, and Colombia 1 time.

plotFigure2a("All", sd=TRUE, debug=FALSE)

3.1.2 Breakdown by game

Here we look at the variance/standard deviation in individual games. We look at all the games together. (e.g., Does the concept of a “cozy game” elicit more agreement within the US than it does in Japan in specific games?)

Note that not in one time did the US has the highest standard deviation.

3.1.2.1 PUBG

plotFigure2a("PUBG", sd=TRUE, debug=FALSE)
## `summarise()` has grouped output by 'game', 'country'. You can override using
## the `.groups` argument.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

3.1.2.2 Grand Theft Auto V

plotFigure2a("Grand Theft Auto V", sd=TRUE, debug=FALSE)
## `summarise()` has grouped output by 'game', 'country'. You can override using
## the `.groups` argument.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

3.1.2.3 Minecraft

plotFigure2a("Minecraft", sd=TRUE, debug=FALSE)
## `summarise()` has grouped output by 'game', 'country'. You can override using
## the `.groups` argument.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

3.1.2.4 Elden Ring

plotFigure2a("Elden Ring", sd=TRUE, debug=FALSE)
## `summarise()` has grouped output by 'game', 'country'. You can override using
## the `.groups` argument.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

3.1.2.5 FIFA22

plotFigure2a("FIFA22", sd=TRUE, debug=FALSE)
## `summarise()` has grouped output by 'game', 'country'. You can override using
## the `.groups` argument.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

3.1.2.6 Call of Duty: Vanguard

plotFigure2a("Call of Duty: Vanguard", sd=TRUE, debug=FALSE)
## `summarise()` has grouped output by 'game', 'country'. You can override using
## the `.groups` argument.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

3.2 Within each region, how much “internal” agreement is there on labels of interest?

In this case, we group countries to higher-level regions based on the World Value Survey map. We perform the same analysis but on a regional level.

# wrapper function that enable individual game
plotFigure2aRegion <- function(game = "All", sd=FALSE, debug=FALSE) {
  var.df <- df %>% group_by(game, region, label) %>%
    summarise(mean_answer = mean(answer, na.rm=TRUE),
          variance = var(answer, na.rm=TRUE),
          sd=sd(answer, na.rm=TRUE))
  
  if(game == "All" && !sd) {
    # average the variances across games for each region
    region.df <- var.df %>% group_by(region, label) %>% 
      summarise(y=mean(variance, na.rm=TRUE))
  } else if (game == "All" && sd) {
    region.df <- var.df %>% group_by(region, label) %>% 
      summarise(y=mean(sd, na.rm=TRUE))
  } else if(game != "All" && !sd) {
    region.df <- var.df %>% filter(game == {{ game }}) %>% mutate(y=variance)
  } else if (game != "All" && sd) {
    region.df <- var.df %>% filter(game == {{ game }}) %>% mutate(y=sd)
  } else {
    print("Error...")
  }
  
  p1 <- plotVariance(region.df, gameTitle=game, sd=sd, binary=FALSE)
  p2 <- plotVariance(region.df, gameTitle=game, sd=sd, binary=TRUE)
  res <- ggarrange(p1, p2, ncol=1, heights = c(1, 6))
  return(res)
}

# plot variance of a label across countries
plotVariance <- function(inputDf, gameTitle="All", sd=FALSE, binary=FALSE) {
  # color the US
  inputDf$color <- inputDf$region == "US"
  
  plotDf <- inputDf[with(inputDf, order(label, region, y)),] %>% 
      filter(!label %in% c("control_complexity", "learning_curve", "difficulty", "replayability")) 
      
  if(!binary) {
    plotDf <- inputDf[with(inputDf, order(label, region, y)),] %>% 
      filter(label %in% c("control_complexity", "learning_curve", "difficulty", "replayability"))
  } 
  if(sd) {
    yLabel <- "Standard Devaition"
  } else {
    yLabel <- "Variance"
  }
  p <- plotDf %>% ggplot(aes(x=reorder_within(region, y, label), y=y, color=color)) +
        geom_point() + 
        scale_x_reordered() +
        facet_wrap(~label, ncol=4, scales = "free") +
        ylim(c(0, 3)) +
        scale_color_manual(values=c("#999999", "#56B4E9")) +
        labs(y = yLabel, x = NULL,
             title = paste0("How much internal agreement is there on labels of interest (", gameTitle, ")")) 
  if(binary) {
    p <- p + ylim(c(0, 1))
  }
  return(p)
}

3.2.1 Taken all games together.

plotFigure2aRegion("All", sd=TRUE, debug=FALSE)

4 Figure 2b: How do different populations view these labels in specific contexts?

To answer this question, we calculate differences in estimated mean label outcomes for each demographic using causal matching analysis. To predict the probability of having a game label in a certain country, we use multilevel regression and post-stratification (MRP).

# Import libraries
source("read.R")
library(ggalt)
library(reshape2)

options(dplyr.summarise.inform = FALSE)

4.1 Calculate differences in estimated mean label outcomes for each demographic using causal matching analysis

We first check the differences between means for each labels across countries below. We run the following analyzeCountry function to each label and explores whether there is a significant difference between US and non-US participants in our matched dataframe.

analyzeCountry <- function(inputDf, label, lower=1, upper=3) {
  df <- inputDf %>% filter(label == {{ label }}) %>% 
    mutate(is_us = ifelse(country == "US", 1, 0)) %>% 
    mutate(gender.no.nonbinary = ifelse(gender == "Male", 1, 0)) %>% # depending on how we want to deal with this
    mutate(gender.no.nonbinary = factor(gender.no.nonbinary)) %>%
    na.omit() 
  
  # causal matching using matchit
  m.out <- matchit(is_us ~ game + ageCut + hardcore + gender.no.nonbinary, method = "nearest", distance = "mahalanobis", link = "probit", replace = TRUE, data=df)
  
  # optional for debugging  
  s.out <- summary(m.out, standardize = TRUE)
  plot(s.out)
  
  matched.df <- match.data(m.out) 
  matched.df$is_us <- factor(matched.df$is_us)
  model <- lm(answer ~ is_us, data=matched.df)
  print(summary(model))

  annotator <- df %>% filter(labeler == "Yes" & label == {{ label }}) %>%
    count(game, answer) %>%
    group_by(game) %>%
    summarize(majority.vote = mean(answer[which(n==max(n))])) %>% ungroup()
  
  p <- plotByGame(matched.df, "is_us", label, LOWER=lower, UPPER = upper) +
    geom_point(aes(x=game, y=majority.vote, color="Polish Annotator",  shape="Polish Annotator"), size=2, data=annotator) +  # majority.vote is the Polish annotator majority vote
    labs(title=paste0(label, " Score by Country"), y="Mean Score", x="Games") +
    scale_color_discrete("Country")
  return(p)
}
  • In the remainder of this section, we look at each individual label. For each label, we can look at whether there is a significant difference between US and non-US participants. For example, learning curve, replaybility, zen, space, violent, action, comedic, grinding, anime, motivation:action, motivation:social, motivation:immersion are significantly different.

4.1.1 Control Complexity

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6644 -0.6257  0.3356  0.3743  1.3743 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.66443    0.05624  29.597   <2e-16 ***
## is_us1      -0.03875    0.05673  -0.683    0.495    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6865 on 8669 degrees of freedom
## Multiple R-squared:  5.384e-05,  Adjusted R-squared:  -6.151e-05 
## F-statistic: 0.4667 on 1 and 8669 DF,  p-value: 0.4945

4.1.2 Learning Curve

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7651 -0.6227 -0.6227  0.3773  1.3773 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.76510    0.06333  27.871   <2e-16 ***
## is_us1      -0.14236    0.06388  -2.228   0.0259 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7731 on 8669 degrees of freedom
## Multiple R-squared:  0.0005725,  Adjusted R-squared:  0.0004572 
## F-statistic: 4.966 on 1 and 8669 DF,  p-value: 0.02588

4.1.3 Difficulty

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90894 -0.90894  0.09106  0.10067  2.10067 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.899329   0.076427  24.852   <2e-16 ***
## is_us1      0.009613   0.077092   0.125    0.901    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9329 on 8669 degrees of freedom
## Multiple R-squared:  1.793e-06,  Adjusted R-squared:  -0.0001136 
## F-statistic: 0.01555 on 1 and 8669 DF,  p-value: 0.9008

4.1.4 Replayability

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4363 -0.4363  0.5637  0.5637  0.7047 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.29530    0.05902  38.892   <2e-16 ***
## is_us1       0.14098    0.05953   2.368   0.0179 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7204 on 8669 degrees of freedom
## Multiple R-squared:  0.0006465,  Adjusted R-squared:  0.0005313 
## F-statistic: 5.608 on 1 and 8669 DF,  p-value: 0.0179

4.1.5 Pacifist

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2013 -0.1641 -0.1641 -0.1641  0.8359 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20134    0.03039   6.626 3.65e-11 ***
## is_us1      -0.03730    0.03065  -1.217    0.224    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3709 on 8669 degrees of freedom
## Multiple R-squared:  0.0001708,  Adjusted R-squared:  5.544e-05 
## F-statistic: 1.481 on 1 and 8669 DF,  p-value: 0.2237

4.1.6 Made for kids

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3557 -0.3202 -0.3202  0.6798  0.6798 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.35570    0.03824   9.301   <2e-16 ***
## is_us1      -0.03547    0.03858  -0.920    0.358    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4668 on 8669 degrees of freedom
## Multiple R-squared:  9.754e-05,  Adjusted R-squared:  -1.781e-05 
## F-statistic: 0.8456 on 1 and 8669 DF,  p-value: 0.3578

4.1.7 Cozy

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3044 -0.3044 -0.3044  0.6956  0.7248 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.27517    0.03768   7.302 3.08e-13 ***
## is_us1       0.02922    0.03801   0.769    0.442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.46 on 8669 degrees of freedom
## Multiple R-squared:  6.817e-05,  Adjusted R-squared:  -4.718e-05 
## F-statistic: 0.591 on 1 and 8669 DF,  p-value: 0.4421

4.1.8 Zen

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2773 -0.2773 -0.2773  0.7227  0.7920 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20805    0.03662   5.681 1.38e-08 ***
## is_us1       0.06923    0.03694   1.874    0.061 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.447 on 8669 degrees of freedom
## Multiple R-squared:  0.000405,   Adjusted R-squared:  0.0002897 
## F-statistic: 3.512 on 1 and 8669 DF,  p-value: 0.06096

4.1.9 Fantasy

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3340 -0.3340 -0.3340  0.6660  0.6711 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.32886    0.03864   8.511   <2e-16 ***
## is_us1       0.00510    0.03898   0.131    0.896    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4716 on 8669 degrees of freedom
## Multiple R-squared:  1.975e-06,  Adjusted R-squared:  -0.0001134 
## F-statistic: 0.01712 on 1 and 8669 DF,  p-value: 0.8959

4.1.10 Space

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.08054 -0.01877 -0.01877 -0.01877  0.98123 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.08054    0.01141   7.061 1.78e-12 ***
## is_us1      -0.06176    0.01150  -5.368 8.15e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1392 on 8669 degrees of freedom
## Multiple R-squared:  0.003313,   Adjusted R-squared:  0.003198 
## F-statistic: 28.82 on 1 and 8669 DF,  p-value: 8.151e-08

4.1.11 Heroic

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2537 -0.2537 -0.2537  0.7463  0.8054 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.19463    0.03560   5.467 4.69e-08 ***
## is_us1       0.05907    0.03591   1.645      0.1    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4345 on 8669 degrees of freedom
## Multiple R-squared:  0.000312,   Adjusted R-squared:  0.0001967 
## F-statistic: 2.706 on 1 and 8669 DF,  p-value: 0.1

4.1.12 Real World

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3624 -0.3545 -0.3545  0.6455  0.6455 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.362416   0.039197   9.246   <2e-16 ***
## is_us1      -0.007922   0.039538  -0.200    0.841    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4785 on 8669 degrees of freedom
## Multiple R-squared:  4.631e-06,  Adjusted R-squared:  -0.0001107 
## F-statistic: 0.04014 on 1 and 8669 DF,  p-value: 0.8412

4.1.13 Violent

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4066 -0.4066 -0.4066  0.5934  0.8054 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.19463    0.04012   4.851 1.25e-06 ***
## is_us1       0.21196    0.04047   5.237 1.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4898 on 8669 degrees of freedom
## Multiple R-squared:  0.003154,   Adjusted R-squared:  0.003039 
## F-statistic: 27.43 on 1 and 8669 DF,  p-value: 1.669e-07

4.1.14 Action

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4703 -0.4703 -0.4703  0.5297  0.6778 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.32215    0.04085   7.886  3.5e-15 ***
## is_us1       0.14816    0.04121   3.596 0.000325 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4986 on 8669 degrees of freedom
## Multiple R-squared:  0.001489,   Adjusted R-squared:  0.001374 
## F-statistic: 12.93 on 1 and 8669 DF,  p-value: 0.0003253

4.1.15 Emotional

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2081 -0.1907 -0.1907 -0.1907  0.8093 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.20805    0.03221   6.460  1.1e-10 ***
## is_us1      -0.01737    0.03249  -0.535    0.593    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3931 on 8669 degrees of freedom
## Multiple R-squared:  3.298e-05,  Adjusted R-squared:  -8.237e-05 
## F-statistic: 0.2859 on 1 and 8669 DF,  p-value: 0.5929

4.1.16 Comedic

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4488 -0.4488 -0.4488  0.5512  0.6309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.36913    0.04073   9.063   <2e-16 ***
## is_us1       0.07971    0.04109   1.940   0.0524 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4972 on 8669 degrees of freedom
## Multiple R-squared:  0.000434,   Adjusted R-squared:  0.0003187 
## F-statistic: 3.764 on 1 and 8669 DF,  p-value: 0.0524

4.1.17 Experimental

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.1745 -0.1632 -0.1632 -0.1632  0.8368 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.17450    0.03029   5.760 8.69e-09 ***
## is_us1      -0.01127    0.03056  -0.369    0.712    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3698 on 8669 degrees of freedom
## Multiple R-squared:  1.57e-05,   Adjusted R-squared:  -9.966e-05 
## F-statistic: 0.1361 on 1 and 8669 DF,  p-value: 0.7122

4.1.18 Strategy

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4488 -0.4488 -0.4488  0.5512  0.5571 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.442953   0.040750  10.870   <2e-16 ***
## is_us1      0.005885   0.041105   0.143    0.886    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4974 on 8669 degrees of freedom
## Multiple R-squared:  2.365e-06,  Adjusted R-squared:  -0.000113 
## F-statistic: 0.0205 on 1 and 8669 DF,  p-value: 0.8862

4.1.19 Grinding

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5116 -0.5116  0.4884  0.4884  0.6510 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.34899    0.04092   8.528  < 2e-16 ***
## is_us1       0.16262    0.04128   3.940 8.23e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4995 on 8669 degrees of freedom
## Multiple R-squared:  0.001787,   Adjusted R-squared:  0.001672 
## F-statistic: 15.52 on 1 and 8669 DF,  p-value: 8.227e-05

4.1.20 Anime

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.17450 -0.07897 -0.07897 -0.07897  0.92103 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.17450    0.02228   7.831 5.40e-15 ***
## is_us1      -0.09552    0.02248  -4.250 2.16e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.272 on 8669 degrees of freedom
## Multiple R-squared:  0.002079,   Adjusted R-squared:  0.001964 
## F-statistic: 18.06 on 1 and 8669 DF,  p-value: 2.16e-05

4.1.21 Hand drawn

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14094 -0.09939 -0.09939 -0.09939  0.90061 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.14094    0.02459   5.732 1.02e-08 ***
## is_us1      -0.04155    0.02480  -1.675   0.0939 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3001 on 8669 degrees of freedom
## Multiple R-squared:  0.0003237,  Adjusted R-squared:  0.0002083 
## F-statistic: 2.807 on 1 and 8669 DF,  p-value: 0.09391

4.1.22 Stylized

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5394 -0.5394  0.4606  0.4606  0.4832 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.51678    0.04084   12.65   <2e-16 ***
## is_us1       0.02265    0.04120    0.55    0.582    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4985 on 8669 degrees of freedom
## Multiple R-squared:  3.487e-05,  Adjusted R-squared:  -8.048e-05 
## F-statistic: 0.3023 on 1 and 8669 DF,  p-value: 0.5825

4.1.23 Motivation: Action

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5419 -0.5419  0.4581  0.4581  0.6067 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.39333    0.04067   9.671  < 2e-16 ***
## is_us1       0.14860    0.04103   3.622 0.000294 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4981 on 8649 degrees of freedom
## Multiple R-squared:  0.001514,   Adjusted R-squared:  0.001399 
## F-statistic: 13.12 on 1 and 8649 DF,  p-value: 0.0002941

4.1.24 Motivation: Social

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4985 -0.4985 -0.3467  0.5015  0.6533 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.34667    0.04080   8.498  < 2e-16 ***
## is_us1       0.15186    0.04115   3.690 0.000226 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4996 on 8649 degrees of freedom
## Multiple R-squared:  0.001572,   Adjusted R-squared:  0.001456 
## F-statistic: 13.62 on 1 and 8649 DF,  p-value: 0.0002256

4.1.25 Motivation: Mastery

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4607 -0.4607 -0.4607  0.5393  0.5800 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.42000    0.04070   10.32   <2e-16 ***
## is_us1       0.04065    0.04105    0.99    0.322    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4984 on 8649 degrees of freedom
## Multiple R-squared:  0.0001134,  Adjusted R-squared:  -2.252e-06 
## F-statistic: 0.9805 on 1 and 8649 DF,  p-value: 0.3221

4.1.26 Motivation: Achievement

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5292 -0.5292  0.4708  0.4708  0.5000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.50000    0.04076  12.267   <2e-16 ***
## is_us1       0.02923    0.04112   0.711    0.477    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4992 on 8649 degrees of freedom
## Multiple R-squared:  5.843e-05,  Adjusted R-squared:  -5.718e-05 
## F-statistic: 0.5054 on 1 and 8649 DF,  p-value: 0.4772

4.1.27 Motivation: Immersion

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4948 -0.4948 -0.3733  0.5052  0.6267 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.37333    0.04080   9.149  < 2e-16 ***
## is_us1       0.12143    0.04116   2.950  0.00319 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4998 on 8649 degrees of freedom
## Multiple R-squared:  0.001005,   Adjusted R-squared:  0.0008897 
## F-statistic: 8.703 on 1 and 8649 DF,  p-value: 0.003186

4.1.28 Motivation: Creativity

## 
## Call:
## lm(formula = answer ~ is_us, data = matched.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4567 -0.4567 -0.4567  0.5433  0.5733 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.42667    0.04067  10.491   <2e-16 ***
## is_us1       0.02999    0.04103   0.731    0.465    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4981 on 8649 degrees of freedom
## Multiple R-squared:  6.175e-05,  Adjusted R-squared:  -5.386e-05 
## F-statistic: 0.5341 on 1 and 8649 DF,  p-value: 0.4649

4.2 To predict the probability of having a game label in a certain country, we use multilevel regression and post-stratification (MRP).

# Import libraries
source("read.R")
library(ggalt)
library(reshape2)

options(dplyr.summarise.inform = FALSE)

In this subsection, we try to predict a probability of labeling a certain game with a particular Genome label by different countries.

# read csv
countryLevelDf <- read_csv("demographic.csv")
## Rows: 7167 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Country, AgeGroup, Gender
## dbl (1): UserCount
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
countryLevelDf$UserCount <- as.numeric(countryLevelDf$UserCount)

# make the country name consistent with the individual survey dataframe
countryLevelDf <- countryLevelDf %>% 
  filter(Country %in% c("Argentina", "Brazil", "Chile", "Colombia", "Germany", "Greece", "India", "Japan", "Korea", "Mexico", "Nigeria", "Poland", "Saudi Arabia", "Singapore", "South Africa", "United States")) %>%
  dplyr::rename(ageCut = AgeGroup, country = Country, gender = Gender) %>%
  mutate(ageCut = case_when(ageCut == "18 - 24" ~ "18-24",
                                 ageCut == "25 - 34" ~ "25-34",
                                 ageCut == "35 - 44" ~ "35-44",
                                 ageCut == "45 - 55" ~ "45-55",
                                 ageCut == "> 55" ~ "55+"))
countryLevelDf$country[countryLevelDf$country == "United States"] <- "US"
countryLevelDf$country[countryLevelDf$country == "Saudi Arabia"] <- "Saudi.Arabia"
countryLevelDf$country[countryLevelDf$country == "South Africa"] <- "South.Africa"

countryLevelDf <- countryLevelDf %>% mutate(region = case_when(country %in% c("Japan", "Korea") ~ "Confucian",
                            country %in% c("Singapore", "India") ~ "West.South.Asia",
                            country %in% c("Argentina", "Chile", "Colombia", "Mexico", "Brazil") ~ "Latin.America",
                            country %in% c("Germany") ~ "Protestant.Europe",
                            country %in% c("South.Africa", "Nigeria", "Saudi.Arabia") ~ "Islamic",
                            country %in% c("Poland") ~ "Catholic.Europe",
                            country %in% c("Greece") ~ "Orthodox.Europe",
                            country %in% c("US") ~ "US"))

countryLevelDf$country <- factor(countryLevelDf$country)
countryLevelDf$ageCut <- factor(countryLevelDf$ageCut, c("18-24","25-34", "35-44", "45-55", "55+"))
countryLevelDf$gender <- factor(countryLevelDf$gender)
countryLevelDf$region = factor(countryLevelDf$region)
# Calculate the percentage of age+gender group by country
countryLevelDf <- countryLevelDf %>% group_by(country) %>% 
  mutate(UserCount.per = UserCount/sum(UserCount, na.rm=TRUE))
countryLevelDf <- countryLevelDf %>% filter(ageCut != "" & ageCut != "Unknown" & !is.na(ageCut)) %>%
  filter(gender != "" & gender != "Unknown" & !is.na(gender)) %>%
  mutate(gender = ifelse(gender == "Male", "Male", "Female or Nonbinary"))

4.3 Exploratory Data Analysis

We examine the difference between our sample of ~5000 participants and the true MS population.

source("read.R")
df <- df %>% mutate(gender = ifelse(gender == "Male", "Male", "Female or Nonbinary"))
age_sample <- df %>%  
  dplyr::mutate(age = factor(ageCut, ordered = FALSE)) %>% 
  dplyr::group_by(age) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::mutate(Sample = n/sum(n))

age_post <- countryLevelDf %>% 
  dplyr::mutate(age = factor(ageCut, ordered = FALSE)) %>% 
  dplyr::group_by(age) %>% 
  dplyr::summarise(n_post = sum(UserCount)) %>% 
  dplyr::mutate(Population = n_post/sum(n_post))
age <- dplyr::inner_join(age_sample, age_post, by = "age") %>% select(age, Sample, Population)

age_plot <- ggplot() + 
  ylab("") + xlab("Proportion") + theme_bw() + coord_flip()  + 
  geom_dumbbell(data = age, aes(y = age, x = Sample, xend = Population)) +
  geom_point(data = melt(age, id = "age"), aes(y = age, x = value, color = variable), size = 2) +
  scale_x_continuous(limits = c(0, 0.5), breaks = c(0, .1, .2, .3, .4, .5)) + 
  ggtitle("Age")
age_plot

# Gender because the poststratification dataset does not have nonbinary 
male_sample <- df %>% 
  dplyr::mutate(gender = ifelse(gender != "Male", "Female or Nonbinary", "Male")) %>%
  dplyr::group_by(gender) %>% 
  dplyr::summarise(n = n()) %>% 
  dplyr::mutate(Sample = n/sum(n))
male_post <- countryLevelDf %>% 
  dplyr::group_by(gender) %>% 
  dplyr::summarise(n_post = sum(UserCount)) %>% 
  dplyr::mutate(Population = n_post/sum(n_post))
male <- dplyr::inner_join(male_sample, male_post, by = "gender") %>% select(gender, Sample, Population)
male_plot <- ggplot() + 
  ylab("") + xlab("") + theme_bw() + coord_flip()  + 
  geom_dumbbell(data = male, aes(y = gender, x = Sample, xend = Population)) +
  geom_point(data = melt(male, id = "gender"), aes(y = gender, x = value, color = variable), size = 2) +
  scale_x_continuous(limits = c(0, 1), breaks = c(0, .2, .4, .6, .8, 1.0)) + ggtitle("Gender")
male_plot

4.3.1 Poststratification for country

  • Here we further show that for the US, it is way higher likely that these games such as PUBG, Elden Ring, GTA, Call of Duty to be violent.

4.3.1.1 TODO: Add the hardcore percentage before merging the table

library(data.table)
poststratify <- function(game, label) {
  temp <- df %>% filter(label == {{ label }} & game == {{ game }}) %>% 
    filter(accessibility %in% c("Yes", "No")) %>% 
    mutate(gender = ifelse(gender == "Male", "Male", "Female or Nonbinary")) %>%
    na.omit()

  mod <- glm(answer ~ hardcore + accessibility + country + gender + ageCut, data = temp, family = binomial(link="logit"))
  post <- expand.grid(accessibility=unique(temp$accessibility), 
                      country=unique(temp$country), 
                      gender=unique(temp$gender), 
                      ageCut=unique(temp$ageCut),
                      hardcore=unique(temp$hardcore))

  post <- post %>% 
    mutate(gender = ifelse(gender != "Male", "Female", "Male")) %>% 
    left_join(countryLevelDf, by=c("country", "gender", "ageCut")) %>%
    mutate(gender = ifelse(gender != "Male", "Female or Nonbinary", "Male"))
  
  post <- post %>% 
    mutate(UserCount.per = ifelse(accessibility == "No", UserCount.per * 0.1, UserCount.per * 0.9))  # heuristic about accessibility 
    
  post$prediction <- predict(mod, newdata=post, type="response", allow.new.levels=TRUE)
  post$weight.pred <- post$prediction*post$UserCount.per 
  results <- data.table(post)[ , .(final.est = sum(weight.pred, na.rm=TRUE)), by = .(country)]
  
  return(results)
}
labelLst = c("pacifist", "made.for.kids", "cozy", "zen", "fantasy", "space", "heroic", "real.world", "violent", 
             "action", "emotional", "comedic", "experimental", "strategy", "grinding", "anime", "hand.drawn", 
             "stylized", "action.motivation", "social", "mastery", "achievement", "immersion", "creativity")
gameLst <- unique(df$game)
predicted.df <- data.frame(country=c(), final.est=c(), label=c(), game = c())
# Calculate the poststratified probability
for(g in gameLst) {
  for(l in labelLst) {
    label.df <- poststratify(g, l)
    label.df$label <- l
    label.df$game <- g
    predicted.df <- rbind(predicted.df, label.df)
  }
}
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# View the countries with a probability > 0.75
predicted.df %>% filter(final.est > 0.5 & label == "violent")
##          country final.est   label                   game
##  1:           US 0.7451460 violent                   PUBG
##  2:       Greece 0.5973012 violent                   PUBG
##  3:        Japan 0.5699150 violent                   PUBG
##  4:       Mexico 0.5149857 violent                   PUBG
##  5:       Poland 0.5240003 violent                   PUBG
##  6:           US 0.8926281 violent     Grand Theft Auto V
##  7:    Singapore 0.6421735 violent     Grand Theft Auto V
##  8: South.Africa 0.5146014 violent     Grand Theft Auto V
##  9:       Brazil 0.5838548 violent     Grand Theft Auto V
## 10:        Chile 0.5220274 violent     Grand Theft Auto V
## 11:     Colombia 0.5268495 violent     Grand Theft Auto V
## 12:      Germany 0.7017908 violent     Grand Theft Auto V
## 13:       Greece 0.7147602 violent     Grand Theft Auto V
## 14:        Japan 0.5895735 violent     Grand Theft Auto V
## 15:        Korea 0.6300997 violent     Grand Theft Auto V
## 16:       Mexico 0.6231587 violent     Grand Theft Auto V
## 17:       Poland 0.5453907 violent     Grand Theft Auto V
## 18:           US 0.8339568 violent             Elden Ring
## 19:    Singapore 0.5073604 violent             Elden Ring
## 20:      Germany 0.5004755 violent             Elden Ring
## 21:       Greece 0.5778392 violent             Elden Ring
## 22:       Poland 0.5420260 violent             Elden Ring
## 23:           US 0.6905598 violent Call of Duty: Vanguard
## 24:      Germany 0.5635139 violent Call of Duty: Vanguard
## 25:       Greece 0.5231360 violent Call of Duty: Vanguard
##          country final.est   label                   game

4.3.2 Alternative approach using rstanarm

https://bookdown.org/jl5522/MRP-case-studies/introduction-to-mrp.html

# res <- df %>% filter(label == "made.for.kids") %>%
#   filter(!label %in% c("NA.positive.opinion", "NA.negative.opinion", "NA.feeling", "NA.art")) %>%
#   filter(!is.na(country) & !is.na(gender) & !is.na(Language) & !is.na(ageCut)) %>%
#   mutate(male = ifelse(gender == "Male", 1, 0)) %>% 
#   mutate(male = as.factor(male)) %>%
#   select(Response.ID, country, male, ageCut, region, answer)
# library(rstanarm)
# fit <- stan_glmer(answer ~  (1|country) + (1|ageCut) + male + region,
#   family = binomial(link = "logit"),
#   data = res,
#   prior = normal(0, 1, autoscale = TRUE),
#   prior_covariance = decov(scale = 0.50),
#   adapt_delta = 0.99,
#   refresh = 0,
#   seed = 1010)

5 Figure 3: If label outcome differences are found across countries, is there a data-backed explanation relating to culture?

To answer this question, we use the Hofstede’s theory of cultural dimensions. This question answers whether culturally “closer” countries have more similar survey results.

  • Define ‘culture’ using Hofstede’s cultural dimension map and calculate cultural similarity scores between pairs of countries
  • Calculate correlations between sampled survey responses for individuals across pairs of countries
library(tidyverse)
library(tidytext)
library(data.table)
source("read.R")

theme_set(
  theme_bw() + 
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size=8))
)
labels <- unique(df$label)
# Calculate correlations between sampled survey responses for individuals across pairs of countries
rnormt <- function(n, range, mu, s) {
  # range is a vector of two values
  F.a <- pnorm(min(range), mean = mu, sd = s)
  F.b <- pnorm(max(range), mean = mu, sd = s)
  
  u <- runif(n, min = F.a, max = F.b)
  return(qnorm(u, mean = mu, sd = s))
}

countryCorr <- df %>% select(Response.ID, game, country, label, answer)
# Responses that are lower than 5 participants per game 
summary <- countryCorr %>% filter(label == "learning_curve") %>%  
  group_by(country, game) %>%
  summarise(n=n(), mean=mean(answer), sd=sd(answer)) %>%
  ungroup() %>% 
  filter(n < 5)

# games that we will ignore altogether
gameTooFew <- unique(summary$game)
gameTooFew
## [1] "The Sims 3"                    "Animal Crossing: New Horizons"
## [3] "Mario Kart 8"                  "Stardew Valley"               
## [5] "Candy Crush"
# Compare "label space" vector
updateVector <- function(inputCountry, inputrnormtA, inputLabel, lower=0, upper=1) {
  # update the vector of a country based on a specific label
  # Args: 
  #   country: string of countries
  #   label: string of the label of interest
  #   lower, upper: the lower and upper bound (0-1 for binary, 0-3 for control_complexity etc, 0-4 for difficulty)
  # Returns:
  #   an updated vector in the label space
  inputLabel
  inputCountry
  if(upper - lower == 0) {
    print("Something happened")
  }
  
  dfA <- df %>% filter(label == {{ inputLabel }}) %>%
    mutate(answer = (1-0) * (answer-lower) / (upper-lower) + 0) %>%  # the filter and mutate_at order is important and cannot be switched
    filter(country == {{ inputCountry }}) %>% 
    filter(!game %in% gameTooFew) %>%
    group_by(game) %>%
    dplyr::summarise(mean=mean(answer, na.rm=TRUE), sd=sd(answer, na.rm=TRUE)) 
  
  
  rnormtA <- inputrnormtA
  for(row in 1:nrow(dfA)) {
    meanA <- as.double(dfA[row, "mean"])
    sdA <- as.double(dfA[row, "sd"])
    
    if(sdA == 0) {
      gameA <- rep(meanA, 1000)
    } else {
      gameA <- rnormt(1000, c(0, 1), meanA, sdA) 
    }
    rnormtA <- c(rnormtA, gameA)
  }
  
  return(rnormtA)
}

labelSpace <- data.frame(matrix(ncol = 0, nrow = 28*600))
countries <- unique(df$country)
labels <- c("control_complexity", "learning_curve", "difficulty", "replayability", "pacifist", "made.for.kids", "cozy", "zen", "fantasy", "space", "heroic", "real.world", "violent", 
             "action", "emotional", "comedic", "experimental", "strategy", "grinding", "anime", "hand.drawn", 
             "stylized", "action.motivation", "social", "mastery", "achievement", "immersion", "creativity")
for(country in countries) {
  v <- c()
  for(label in labels) {
   
    if(label %in% c("control_complexity", "learning_curve", "replayability")) {
      v <- updateVector(country, v, label, lower=1, upper=3)
    } else if (label == "difficulty") {
      v <- updateVector(country, v, label, lower=1, upper=4)
    } else {
      v <- updateVector(country, v, label, lower=0, upper=1)
    }
  }
  labelSpace <- cbind(labelSpace, as.data.frame(v))
  names(labelSpace)[names(labelSpace) == "v"] <- country
}

library(corrplot)
mx <- cor(labelSpace, use = "complete.obs")
corrplot(mx, type = "upper", order = "FPC", tl.cex=2,
         tl.col = "black", tl.srt = 45)

# Plot label correlation against the Hofstede scores
mx[upper.tri(mx, diag = TRUE)] <- NA
rownames(mx) <- colnames(mx) 
mx <- na.omit(reshape::melt(t(mx)))
mx <- mx[ order(mx$X1, mx$X2), ]
labelCorr <- mx %>% select(X1, X2, value)
labelCorr <- labelCorr %>% dplyr::rename(countryA=X1, countryB=X2, label.corr=value)
labelCorr$countryA <- as.character(labelCorr$countryA)
labelCorr$countryB <- as.character(labelCorr$countryB)
labelCorr <- transform(labelCorr, countryA = pmin(countryA, countryB), countryB=pmax(countryA, countryB))
head(labelCorr)
##        countryA countryB label.corr
## 17        India       US  0.3050486
## 33    Singapore       US  0.3081656
## 49 South.Africa       US  0.3761590
## 65      Nigeria       US  0.2540979
## 81    Argentina       US  0.3483207
## 97       Brazil       US  0.3691277
# 1. Find the pairwise correlation between countries based on the Hofstede's framework
# correlation based on six metrics: power.distance, individualism, masculinity, uncertainty
# long.term.orientation, and indulgence
library(reshape)
hofstedeDf <- read.csv("hoftstede.csv")
hofstedeDf <- hofstedeDf %>% select(-X) %>%
  mutate(power.distance = power.distance / 100,
         individualism = individualism / 100,
         masculinity = masculinity / 100,
         uncertainty = uncertainty / 100, 
         long.term.orientation = long.term.orientation / 100,
         indulgence = indulgence / 100)  # scale the hofstede score to 0 - 1
corrMatrix <- cor(t(hofstedeDf[, -1]))  # remove the first column with countries

corrMatrix[upper.tri(corrMatrix, diag = TRUE)] <- NA
rownames(corrMatrix) <- colnames(corrMatrix) <- hofstedeDf$country
corrMatrix <- na.omit(reshape::melt(t(corrMatrix)))
corrMatrix <- corrMatrix[ order(corrMatrix$X1, corrMatrix$X2), ]
hofstedeCorr <- corrMatrix %>% select(X1, X2, value) %>% dplyr::rename(countryA=X1, countryB=X2, hofstede.corr=value)
hofstedeCorr$countryA <- as.character(hofstedeCorr$countryA)
hofstedeCorr$countryB <- as.character(hofstedeCorr$countryB)
hofstedeCorr <- transform(hofstedeCorr, countryA = pmin(countryA, countryB), countryB=pmax(countryA, countryB))
head(hofstedeCorr)
##     countryA     countryB hofstede.corr
## 17     India    Singapore    0.44105595
## 33     India           US   -0.40050181
## 49     India      Nigeria   -0.06628162
## 65     India South.Africa   -0.48927712
## 81 Argentina        India   -0.35943139
## 97    Brazil        India    0.18844511
  • This finding shows that there is no clear trend in the correlation between Hofstede’s dimensions of culture and the label correlations. It seems that the label correlation is roughly the same across these countries. In other words, the Hofstede’s dimensions of culture, which was observed in organizational culture at IBM, doesn’t seem to explain the difference well.
# Now plotting the two correlations 
combined <- hofstedeCorr %>% left_join(labelCorr, by=c("countryA", "countryB")) 

ggplot(aes(x=hofstede.corr, y=label.corr), data=combined) +
  geom_point() +
  ylim(c(0,1)) +
  geom_smooth(method='lm')
## `geom_smooth()` using formula 'y ~ x'

The following takes a similar approach, but looks at the individual game level.

# # Construct the matrix: 6000 * 16 (1000 simulation * 6 games) * 16 countries
# compareCountry <- function(countryA, countryB, label, lower=0, upper=1) {
#   # compare the label scoring between two countries 
#   # Args: 
#   #   countryA, countryB: string of countries
#   #   label: string of the label of interest
#   #   lower, upper: the lower and upper bound (0-1 for binary, 0-3 for control_complexity etc, 0-4 for difficulty)
#   # Returns:
#   #   a dataframe that contains country A and country B, pearson correlation estimate, p.value, as well as the label
#   dfA <- df %>% filter(label == {{ label }}) %>%
#     mutate(answer = (1-0) * (answer-lower) / (upper-lower) + 0) %>% # the filter and mutate_at order is important and cannot be switched
#     filter(country == {{ countryA }}) %>% 
#     filter(!game %in% gameTooFew) %>%
#     group_by(game) %>%
#     dplyr::summarise(mean=mean(answer, na.rm=TRUE), sd=sd(answer, na.rm=TRUE)) 
#   dfB <- df %>% filter(label == {{ label }}) %>%
#     mutate(answer = (1-0) * (answer-lower) / (upper-lower) + 0) %>%
#     filter(country == {{ countryB }}) %>% 
#     filter(!game %in% gameTooFew) %>%
#     group_by(game) %>%
#     dplyr::summarise(mean=mean(answer, na.rm=TRUE), sd=sd(answer, na.rm=TRUE)) 
#   
#   rnormtA <- c()
#   rnormtB <- c()
#   for(row in 1:nrow(dfA)) {
#     meanA <- as.double(dfA[row, "mean"])
#     meanB <- as.double(dfB[row, "mean"])
#     sdA <- as.double(dfA[row, "sd"])
#     sdB <- as.double(dfB[row, "sd"])
#     
#     gameA <- rnormt(1000, c(lower, upper), meanA, sdA)
#     gameB <- rnormt(1000, c(lower, upper), meanB, sdB)
#     if(any(is.na(gameA)) | any(is.na(gameB))) next;
#     rnormtA <- c(rnormtA, gameA)
#     rnormtB <- c(rnormtB, gameB)
#   }
#   
#   cor.res = cor.test(rnormtA, rnormtB, method=c("pearson"), use = "complete.obs")
# 
#   return(data.frame(countryA = countryA, countryB = countryB, estimate=cor.res$estimate, p=cor.res$p.value, label=label))
# }
# 
# matrix <- combn(unique(df$country), 2)  # get the combination of countries
# new.df <- data.frame(countryA=c(), countryB=c(), estimate=c(), p=c(), label=c())
# 
# for(col in 1:ncol(matrix)) {
#   # for each combination (English vs. language X
#   cA <- matrix[1, col]
#   cB <- matrix[2, col]
#   
#   for(label in c("control_complexity", "learning_curve", "replayability")) {
#     comparison <- compareCountry(cA, cB, label, lower=1, upper=3)
#     new.df <- rbind(new.df, comparison)
#   }
#   comparison <- compareCountry(cA, cB, "difficulty", lower=1, upper=4)
#   new.df <- rbind(new.df, comparison)
#   
#   for(label in labels[!labels %in% c("control_complexity", "learning_curve", "difficulty", "replayability")]){
#     comparison <- compareCountry(cA, cB, label, lower=0, upper=1)
#     new.df <- rbind(new.df, comparison) # update the new.df
#   }
# }
# 
# # Here we look at labels with high positive or negative correlations
# new.df %>% filter(estimate > 0.1)

6 Figure 4: How similar are label choices for bilingual speakers when asked questions in their native language versus English?

library(tidyverse)
library(tidytext)
library(data.table)
source("read.R")

theme_set(
  theme_bw() + 
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size=8))
)

df <- df %>% filter(!label %in% c("NA.positive.opinion", "NA.negative.opinion", "NA.feeling", "NA.art"))

We ran our surveys in both English and non-English languages on bilingual speakers within each non-English speaking country.

rnormt <- function(n, range, mu, s) {
  if(s == 0) {
    return(c(NaN))
  }
  # range is a vector of two values
  F.a <- pnorm(min(range), mean = mu, sd = s)
  F.b <- pnorm(max(range), mean = mu, sd = s)
  
  u <- runif(n, min = F.a, max = F.b)
  return(qnorm(u, mean = mu, sd = s))
}
# For some weird reason, I need to output the first four variable
compareLanguage <- function(languageA, languageB, inputCountries, label, lower=0, upper=1) {
  languageA 
  languageB
  inputCountries
  label
  dfA = df %>% dplyr::filter(label == {{ label }}) %>%  # select the label
    mutate(answer = (1-0) * (answer-lower) / (upper-lower) + 0) %>%  # rescale the Likert scale answer to 0-1 (e.g., 1-3, 1-4 to 0-1)
    dplyr::filter(Language == {{ languageA }} & country %in% inputCountries) %>% 
    group_by(game) %>%
    summarise(mean=mean(answer, na.rm=TRUE), sd=sd(answer, na.rm=TRUE)) %>%
    filter(!is.na(sd))  # in case there is only one response in a country
  
  # same operation for languageB
  dfB = df %>% dplyr::filter(label == {{ label }}) %>%  # select the label
    mutate(answer = (1-0) * (answer-lower) / (upper-lower) + 0) %>%  # rescale the Likert scale answer to 0-1 (e.g., 1-3, 1-4 to 0-1)
    dplyr::filter(Language == {{ languageB }} & country %in% inputCountries) %>% 
    group_by(game) %>%
    summarise(mean=mean(answer, na.rm=TRUE), sd=sd(answer, na.rm=TRUE)) %>%
    filter(!is.na(sd))  # in case there is only one response in a country

  rnormtA <- c()
  rnormtB <- c()

  for(row in unique(df$game)) {
    meanA <- as.double(dfA[dfA$game == row, "mean"])
    meanB <- as.double(dfB[dfB$game == row, "mean"])
    sdA <- as.double(dfA[dfA$game == row, "sd"])
    sdB <- as.double(dfB[dfB$game == row, "sd"])

    if(any(is.na(meanA)) | any(is.na(meanB))) next;
    if(any(is.na(sdA)) | any(is.na(sdB))) next;
    if(any(sdA == 0) | any(sdB == 0)) next;

    gameA <- rnormt(1000, c(0, 1), meanA, sdA)
    gameB <- rnormt(1000, c(0, 1), meanB, sdB)
    rnormtA <- c(rnormtA, gameA)
    rnormtB <- c(rnormtB, gameB)
  }
  
  # no game is available
  if(length(rnormtA) == 0) {
    return(data.frame(languageA = languageA, languageB = languageB, estimate=NaN, p=NaN, label=label))
  }

  cor.res = cor.test(rnormtA, rnormtB, method=c("pearson"), use = "complete.obs")
  return(data.frame(languageA = languageA, languageB = languageB, estimate=cor.res$estimate, p=cor.res$p.value, label=label))
}
matrix <- combn(unique(df$Language), 2)  # get the combination of languages
new.df <- data.frame(languageA=c(), languageB=c(), estimate=c(), p=c(), label=c())
for(col in 1:ncol(matrix)) {
  # for each combination (English vs. language X
  l1 <- matrix[1, col]  # English
  if(l1 != "English") break
  l2 <- matrix[2, col]  # language X
  language.countries <- df %>% filter(Language == l2)  # find the correpondings countries to language X
  
  for(label in c("control_complexity", "learning_curve", "replayability")) {
    comparison <- compareLanguage(l1, l2, c(unique(language.countries$country)), label, lower=1, upper=3)
    new.df <- rbind(new.df, comparison)
  }

  comparison <- compareLanguage(l1, l2, c(unique(language.countries$country)), "difficulty", lower=1, upper=4)
  new.df <- rbind(new.df, comparison)

  for(label in labels[!labels %in% c("control_complexity", "learning_curve", "difficulty", "replayability")]){
    comparison <- compareLanguage(l1, l2, c(unique(language.countries$country)), label, lower=0, upper=1)
    new.df <- rbind(new.df, comparison)  # update the new.df
  }
}
  • This result looks very cool as it indicates that the current translation of tags might be leaning toward popular markets. Japanese, Arabic, Greek markets are poorly translated.
new.df %>% filter(estimate < 0)
##        languageA languageB      estimate            p         label
## cor88    English     Greek -3.572007e-02 0.0504324223      pacifist
## cor90    English     Greek -2.236068e-02 0.0613827682          cozy
## cor94    English     Greek -2.098130e-03 0.8820876717    real.world
## cor107   English     Greek -1.080524e-02 0.4026935824       mastery
## cor113   English  Japanese -1.382350e-03 0.9303534221 replayability
## cor115   English  Japanese -3.557793e-02 0.2610038057      pacifist
## cor119   English  Japanese -3.656037e-02 0.2480574282       fantasy
## cor120   English  Japanese -7.831736e-02 0.0004554858        heroic
## cor123   English  Japanese -1.854424e-02 0.3099275849        action
## cor124   English  Japanese -7.554202e-02 0.0168816502     emotional
## cor154   English    Polish -2.320086e-02 0.0723358633      grinding
## cor155   English    Polish -2.690210e-03 0.8829049323    hand.drawn
## cor158   English    Polish -3.096397e-05 0.9980867074        social
## cor166   English    Arabic -4.337230e-02 0.1705364952    difficulty
## cor167   English    Arabic -1.732631e-02 0.5841988618       fantasy
## cor169   English    Arabic -1.251937e-02 0.6925348295       violent
## cor173   English    Arabic -4.157085e-02 0.1890113883   achievement